Abstract: In this paper, we propose an agreement-based fault
detection and recovery protocol for cluster heads (CHs) in
wireless sensor networks (WSNs) with a two-level cluster
hierarchy. The aim of the protocol is to detect CH failure
accurately and thereby avoid the unnecessary energy
consumption caused by mistaken detection. To this end, each
cluster member detects the failure of its CH independently,
and the cluster members then run a distributed agreement
protocol to reach agreement on the failure of the CH. The
detection process runs concurrently with normal network
operation, with each cluster member periodically performing
the distributed detection procedure. To reduce energy
consumption, detection relies on the heartbeat messages that
a CH sends periodically. Simulation results show that our
protocol provides high detection accuracy because of the
agreement protocol.

Keywords: Wireless Sensor Network, Clustering, Fault
detection, Agreement protocol, Detection accuracy.

I. Introduction 

Wireless sensor networks (WSNs) consist of hundreds or
even thousands of tiny devices called sensor nodes, which
are deployed to monitor physical or environmental
conditions (such as temperature, sound, vibration, pressure
and motion) at different locations. Applications include
industrial sensing, infrastructure protection and
battlefield awareness. Each sensor node has sensing,
computation and wireless communication capabilities [1].
Sensor nodes sense data and send it to the base station
(BS). They are small and are powered by small onboard
batteries that store only a few joules. Because sensor
nodes are often left unattended, recharging or replacing
their batteries is difficult or impossible. Transmitting
information costs much more energy than computation, so it
is necessary to reduce the number of transmissions.

In many situations, sensor nodes are organized into
clusters, and the data collected by the sensor nodes is sent
to a local cluster head (CH). The CH processes the data
collected from all its cluster members and forwards the
result towards the BS. Clustering is an effective way to
reduce the number of transmissions and to prolong the
lifetime of a network. Because of this extra work, a CH
drains energy much faster than its cluster members, so the
role of CH must be rotated among cluster members to
prolong the lifetime of the network. A number of
clustering-based routing protocols have been proposed in
the literature for WSNs [2]. These protocols improve energy
consumption and performance compared with flat large-scale
WSNs, but they also increase the overhead of configuring
and maintaining the network.



Sensor nodes are prone to failure because they operate in
harsh environments. The failure of a sensor node affects
the normal operation of a WSN [3], and the failure of a CH
makes the situation even worse. A number of authors have
proposed fault-tolerant protocols [4-7]. In this paper, we
propose a fault-tolerant protocol for WSNs that is based on
an agreement protocol.

II. Related Work 

Clustering is an effective way of improving energy
efficiency and prolonging the network lifetime of WSNs.
A CH failure causes loss of connectivity and data within
the cluster, and it also disconnects the cluster members
from the rest of the network. Hence, it is crucial to
detect and recover from CH failure in order to maintain
normal operation of the cluster and of the network as a
whole.

Bandyopadhyay et al. [8] proposed a multi-level,
multi-hop clustering scheme. It derives the probability of
becoming a CH that minimizes energy dissipation. These
probability functions are highly complex and thus require
numerical optimization. The scheme also introduces the
concept of forced CHs: if a node does not fall within the
range of any CH, it becomes a CH itself. The clustering
algorithm is re-run periodically for load balancing.

In REED (Robust Energy Efficient Distributed
clustering) [9], a k-fault-tolerant (i.e., k-connected)
network is constructed. Fault tolerance is achieved by
selecting k independent sets of cluster heads on top of the
physical network, so that each node can quickly switch to
another cluster head if its current cluster head fails or
is attacked. The independent cluster head overlays also
provide load balancing and security. However, the network
is re-clustered periodically, which consumes significant
energy, and maintaining a list of k cluster heads requires
a lot of storage space.

In EEMC (An Energy Efficient Multi Level Clustering) [10],
CHs at each level are elected on the basis of a probability
function that takes into account both residual energy and
distance. In this scheme, all the information needed for
cluster formation is sent to and received from the sink
node. Fault tolerance is provided by periodic
re-clustering of the whole network.

In the cellular approach to fault detection and recovery
[11], the network is partitioned into a virtual grid of
cells, where each cell consists of a group of sensor nodes.
A cell manager and a secondary manager are chosen in each
cell to perform fault management tasks. The secondary
manager works as a backup node that takes control of the
cell when the cell manager fails. This protocol handles
only failures caused by energy depletion.

FTEP [12] is a dynamic and distributed CH election
algorithm with fault tolerance capabilities, based on a
two-level clustering scheme. If the energy level of the
current CH falls below a threshold value, or if a CH fails
to communicate with its cluster members, an election
process based on the residual energy of the sensor nodes is
started. This election appoints a CH and a backup node to
handle CH failure. Failure detection, however, relies on a
single point (the backup node), whose own failure may be
disastrous.

III. System Model 

In this paper, we extend our previous work [13] to a
fault detection and recovery protocol for two-level
clustering.

A. Network Model 

Figure 1 shows the two-level clustering network model
that is used. The symbols and terms used are listed in
Table I. All sensor nodes are homogeneous and have two
transmission modes: a high power transmission mode for
communication between CHs and the BS, and a low power
transmission mode for communication between cluster
members and their CH. Sensor nodes are distributed
uniformly throughout the environment. The communication
medium is radio, links between two sensor nodes are
considered bidirectional, and there is only a single
channel for communication between sensor nodes.




Figure 1 Network Model (legend: Level-2 CH, Level-1 CH,
Common Node)



During network deployment, all sensor nodes are
assigned the same initial energy. All sensor nodes are
assumed to know their geographical location [14]. We
assume that clusters may overlap during the election
procedure, so that every sensor node falls within at least
one cluster. Initially, some sensor nodes are randomly
selected as CHs, and they announce their energy levels and
location information. These CHs start working in high power
transmission mode, while the other, regular sensor nodes
work in low power transmission mode.



B. Sensor Node's Energy Model

A sensor node consists of sensors, analog signal
conditioning, data conversion circuitry, digital signal
processing and a radio link. Each component of a sensor
node consumes energy when sending and receiving data. The
following model gives the energy consumed by the
components of a sensor node. Assuming path loss with
exponent $n$, the energy consumption on each sensor node
is:

$$E_{tx} = l\,(e_{tx} + e_{amp}\,d^{n}) \qquad (1)$$

$$E_{rx} = l\,e_{rx} \qquad (2)$$

According to eq. 1, the transmitter unit consumes energy
$E_{tx}$ to send $l$ bits over a distance $d$, where
$e_{tx}$ is the energy consumed by the transmitter
electronics per bit and $e_{amp}$ is the energy used by the
amplifier per bit. According to eq. 2, the receiving unit
consumes energy $E_{rx}$ to receive $l$ bits, where
$e_{rx}$ is the energy used by the receiver electronics per
bit.
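
As a rough illustration, the radio energy model of this section can be sketched in Python. The sketch assumes the common first-order form, in which the transmission cost grows with the distance raised to the path loss exponent. The function and constant names are chosen for this example only, and the numeric values are the ones listed in Table II.

```python
# Illustrative constants taken from Table II (names are hypothetical).
E_TX_ELEC = 50e-9     # J/bit, transmitter electronics
E_RX_ELEC = 50e-9     # J/bit, receiver electronics
E_AMP = 100e-12       # J/bit/m^2, transmitter amplifier
PATH_LOSS_EXP = 2     # path loss exponent


def tx_energy(bits: int, distance_m: float) -> float:
    """Energy in joules to transmit `bits` over `distance_m` metres."""
    return bits * (E_TX_ELEC + E_AMP * distance_m ** PATH_LOSS_EXP)


def rx_energy(bits: int) -> float:
    """Energy in joules to receive `bits`."""
    return bits * E_RX_ELEC
```

Under these assumptions, sending one 500-bit data packet over the 20 m low power range costs about 45 microjoules, and receiving it costs about 25 microjoules.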

TABLE I
Notations used in Paper

Symbol   Meaning
d        Distance that message travels
l        Number of bits in the message
e_tx     Energy dissipated in transmitter electronics per
         bit (taken to be 50 nJ/bit)
e_amp    Energy dissipated in transmitter amplifier (taken
         to be 50 nJ/bit)
e_rx     Energy dissipated in receiver electronics per bit
         (taken to be 50 nJ/bit)
E_tx     Energy consumed in transmission
E_rx     Energy consumed in receiving
SV       Status vector
[...]    Location of node
[...]    Cluster head of cluster
[...]    Cluster
[...]    Current energy of node
[...]    Energy level at which sensor node can participate
         in election of CH at level-1
[...]    Energy level at which current CH starts election
         process at level-1
[...]    Energy level up to which election process must be
         completed at level-1
[...]    Energy level at which sensor node can participate
         in election of CH at level-2
[...]    Energy level at which current CH starts election
         process at level-2
[...]    Energy level up to which election process must be
         completed at level-2
[...]    Transmission range of node at level-1
[...]    Transmission range of node at level-2

Table I summarizes the meaning of each term and its
typical value. The three level-1 energy thresholds in
Table I (the level at which a node can participate in the
election, the level at which the current CH starts the
election, and the level by which the election must be
completed) are updated during each election process at
level-1. Typically, the participation threshold for the
next election round is set to the average energy level of
all candidate nodes in the current election round. The
election-start threshold is set according to the
participation threshold, and the election-completion
threshold is set according to the election-start threshold
as follows:

election-completion threshold = election-start threshold
- (energy consumption during election process + energy
consumption in data transmission during that period)

The corresponding thresholds for clusters at level-2 are
calculated similarly.
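
The threshold updates described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the reading that the completion threshold is derived from the election-start threshold is an assumption based on the update rule in the text.

```python
def next_participation_threshold(candidate_energies):
    """Average energy (J) of the candidates in the current election round."""
    return sum(candidate_energies) / len(candidate_energies)


def completion_threshold(start_threshold, election_energy, data_tx_energy):
    """Energy level (J) by which the election must be completed.

    Assumed form: the election-start threshold minus the energy spent
    on the election itself and on data transmission during that period.
    """
    return start_threshold - (election_energy + data_tx_energy)
```

For example, three candidates with 0.8 J, 0.6 J and 0.7 J would give a participation threshold of about 0.7 J for the next round.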

IV. FTTCP Protocol 

FTTCP works in two phases: a setup phase and a steady
state phase. The setup phase runs only once, when the
network starts working. In the setup phase, clusters are
formed, and they remain fixed throughout the lifetime of
the network. The steady state phase consists of three
parts: CH election, failure detection and failure recovery.
Failure detection runs in parallel with network operation.

A. Setup Phase 

Clusters are formed only once, during the setup phase,
before the network starts to run (as shown in Figure 2).
Here we explain only level-2; level-1 is explained in [13].
After the clusters at level-1 have been formed, some
level-1 CHs are randomly selected as level-2 CHs, because
all level-1 CHs have equal energy at this point. Each
level-2 CH sends an advertisement message containing its
energy and location to the neighboring level-1 CHs. Each
level-1 CH that hears this advertisement responds with a
return message containing its residual energy and
location. A level-1 CH may be in the range of several
level-2 CHs, but it must finally be associated with exactly
one of them. If a level-1 CH falls within the overlapping
region of more than one level-2 CH, it decides its
association by calculating the value of e/d
(energy/distance) for each of them and selects the level-2
CH with the maximum e/d value. If more than one level-2 CH
yields the same maximum e/d value, one of them is selected
at random. If a level-1 CH does not fall within the range
of any level-2 CH, it declares itself a level-2 CH and is
activated in high power transmission mode. Once the
clusters are established, the level-1 CHs collect data from
their cluster members, perform local data aggregation and
send the result to their level-2 CH. The level-2 CH sends
the data to the base station or sink node in a multi-hop
manner.
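
The e/d association rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name, the dictionary layout of the advertisements and the handling of a zero distance are choices made for this example and do not come from the protocol description.

```python
import math
import random


def choose_level2_ch(own_pos, candidates, rng=random):
    """Pick the level-2 CH with the largest energy/distance ratio.

    `candidates` maps a CH id to (residual_energy_J, (x, y)) as learnt
    from the advertisements that were heard. Returns None when no
    level-2 CH is in range, in which case the caller would declare
    itself a level-2 CH.
    """
    if not candidates:
        return None
    ratios = {}
    for ch_id, (energy, pos) in candidates.items():
        distance = math.dist(own_pos, pos)
        # Assumption: a co-located CH (distance 0) is treated as the best choice.
        ratios[ch_id] = math.inf if distance == 0 else energy / distance
    best = max(ratios.values())
    tied = [ch_id for ch_id, ratio in ratios.items() if ratio == best]
    return rng.choice(tied)  # equal maxima are resolved at random
```

For instance, a level-1 CH at (0, 0) that hears CH "A" with 1.0 J at distance 10 m and CH "B" with 0.9 J at distance 5 m would associate with "B", because 0.9/5 is larger than 1.0/10.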

Clusters are circular at both level-1 and level-2. The
radius at each level is chosen so that every node in a
cluster can communicate with every other node of the same
cluster in a single hop.

B. Steady State Phase 

Once a cluster is formed, its CH creates a TDMA
schedule for the cluster members and sends it to them; this
happens at both levels. Cluster members sense data and send
it to the CH according to the TDMA schedule. This process
continues in every cluster until the CH's current energy
level falls to or below the election-start threshold
(Table I) or the CH fails. The CH then starts the election
of a new CH for the next round, or the cluster recovers
from the failure, respectively (as shown in Figure 3).



Figure 2 Setup Phase (flowchart; legible box labels: any
node; create TDMA schedule and send it to CMs)

Figure 3 Steady State Phase (flowchart; legible box
labels: new backup; call backup node)

CH Election 

The CH broadcasts the participation threshold for the
next round, which is the average energy of the cluster
members that participated in the last election process.
All cluster members within the cluster hear this message
and compare the threshold with their current energy level.
Each cluster member whose current energy is greater than or
equal to the threshold marks itself as a participant in
the election process (as shown in Figure 4). All
participants broadcast their current energy and location.
All participants can hear each other, because all cluster
members are within low power (at level-1) or high power (at
level-2) transmission range of each other. As a result,
every participant knows the energy and location of every
other participant and is therefore aware of the
participant with the highest energy. The participant with
the highest energy promotes itself to CH and is activated
in high power mode (at level-1), whereas the participant
with the second highest energy becomes the backup CH.
Because the new CH receives the energy and location of all
participants during the election, it calculates the average
of their energies to obtain the participation threshold
for the next round. Both the new CH and the backup node
know this value. All participants then mark themselves as
non-participants again, and the previous CH returns to low
power mode (at level-1).
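
One election round, as described above, can be sketched as follows. The function name and data layout are hypothetical, and the deterministic tie-break on node id is an assumption, since the text does not say how equal energies are resolved.

```python
def elect_ch(energies, threshold):
    """Run one election round inside a cluster.

    `energies` maps node id -> current energy (J); `threshold` is the
    participation threshold broadcast by the outgoing CH.
    Returns (new_ch, backup, next_threshold).
    """
    # Only members at or above the threshold take part.
    participants = {n: e for n, e in energies.items() if e >= threshold}
    if not participants:
        return None, None, threshold
    # Highest energy first; node id breaks ties (assumed rule).
    ranked = sorted(participants, key=lambda n: (-participants[n], n))
    new_ch = ranked[0]
    backup = ranked[1] if len(ranked) > 1 else None
    # The new CH averages the participants' energies for the next round.
    next_threshold = sum(participants.values()) / len(participants)
    return new_ch, backup, next_threshold
```

With energies of 0.9 J, 0.5 J, 0.8 J and 0.7 J for nodes a, b, c and d and a threshold of 0.6 J, nodes a, c and d participate, a becomes CH, c becomes the backup, and the threshold for the next round is about 0.8 J.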




Broadcast its and receive 
other participant node's 



highest energy node 
Backup node second highest enersv 



Elected CH sends CH 



Mark itself as non- •< 



m 



decides that CH has failed and broadcasts data plus status 
vector. Other cluster members also listen this message. 
They extract status vector from message and merge it with 
own status vector and this process continuous up to the end 
of the TDMA schedule. At the end of the TDMA frame, 
cluster members reach on an agreement about failure of 
CH. If all bits of status vector are set then it is decided that 
CH has failed. 



any node i 
Is CH? 




Set or rest Status 

Vector (SV) on the 

basis of hearing 



Send its Data+ SV 
and receive SV 
from other nodes 




Figure 4 CH Election 



Return 

EM T OE 1 



Return 



Failure Detection 

The detection process runs parallel with normal 
network operation by periodically performing a distributed 
detection process at each cluster member (as shown in 
Figure 5). For failure detection mechanism each cluster 
member maintains a status vector and a timer. In status 
vector each bit corresponds to a cluster member. Initially 
all bits are set to zero of status vector on each cluster 
member. A bit in the vector is set once its corresponding 
cluster member detects that CH has failed. CH of each 
cluster periodically sends a hello message (i.e. notification 
that CH is alive) to cluster members after a certain time 
interval. Cluster members also know about time interval, 
CH sends it to cluster members. After that time interval 
cluster member, who does not listen hello message, sets its 
corresponding bit as one in status vector and locally 
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Figure 5 CH Failure Detection 
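
The status vector merging described under Failure Detection can be illustrated with a small simulation of one TDMA frame. This is a sketch under simplifying assumptions: the function name is hypothetical, members transmit in slot order, and broadcasts among cluster members are taken to be loss-free, which the real protocol does not guarantee.

```python
def detect_ch_failure(heard_hello):
    """Simulate one TDMA frame of agreement-based CH failure detection.

    `heard_hello[i]` is True when member i received the CH's hello
    message in this interval. Returns (ch_failed, merged_vector) as
    seen by the member that owns the last TDMA slot.
    """
    n = len(heard_hello)
    if n == 0:
        return False, []
    # Each member sets only its own bit when it missed the hello.
    vectors = [[0] * n for _ in range(n)]
    for i, heard in enumerate(heard_hello):
        if not heard:
            vectors[i][i] = 1
    # Slot by slot, the sender broadcasts its vector and every other
    # member merges it into its own vector with a bitwise OR.
    for sender in range(n):
        for listener in range(n):
            if listener != sender:
                vectors[listener] = [
                    a | b for a, b in zip(vectors[listener], vectors[sender])
                ]
    final = vectors[n - 1]
    # The CH is declared failed only if every bit is set.
    return all(final), final
```

In this sketch a single member that still hears the hello message is enough to veto the failure decision, which is what protects a live CH from isolated packet losses.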

Failure Recovery 

Once the cluster members have confirmed the CH failure
through the agreement protocol, the cluster member that
holds the last slot in the TDMA schedule informs the backup
node of the failure. The backup node promotes itself to CH
by sending an advertisement message in high power
transmission mode (as shown in Figure 4). It keeps working
as CH until its residual energy reaches a critical limit or
it fails. Depending on the application, the new CH needs a
new backup node, so it starts an election process for a new
backup node. The backup node election process is similar to
the election process of the CH.

V. Performance Evaluation

A. Simulation Environment 

In this section, we evaluate the performance of the
proposed FTTCP protocol. We used OMNeT++ 4.0 [15] as the
simulator and the same radio model as discussed in
Section III. The basic simulation parameters are given in
Table II.

TABLE II
EXPERIMENT PARAMETERS

Parameter                  Value
Area of sensor field       100 x 100 m^2
Sink position              At origin (0,0)
Initial energy per node    1 J
Path loss exponent         2
e_tx                       50 nJ/bit
e_amp                      100 pJ/bit/m^2
e_rx                       50 nJ/bit
Size of data packet        500 bits
Size of control packet     20 bits
Sensing interval           0.5 s
High transmission range    60 m
Low transmission range     20 m
No. of nodes               300
Cluster size               10, 20, 30

To evaluate the performance of the FTTCP protocol, we
use the following metrics:

• Network lifetime: The time for which the network remains
alive, measured as the number of rounds (including fault
tolerance) for different numbers of nodes in the network.
One round consists of one operation of the network, from
sensing the phenomenon to receiving the data at the sink
node, including the election process and fault handling,
if any.

• CH election overhead: The energy consumed in electing a
CH in the network, i.e., the energy consumed by all the
messages exchanged among sensor nodes to elect a CH.

• Detection accuracy: How accurately a fault can be
detected by the nodes. Detection accuracy is defined in
terms of the probability of false alarm, which is the
probability that an operational CH is mistakenly detected
as faulty. It is measured under different packet loss
rates and cluster sizes.

B. Simulation Results and Discussion

To obtain more reliable and accurate results, we
executed the FTTCP protocol several times with different
numbers of nodes and failure frequencies.

Network lifetime 



It can be observed from Figure 6 that network lifetime
increases as the number of nodes increases. Beyond a
certain number of nodes, however, the network lifetime
starts to decrease because of the higher overhead of
cluster maintenance. FTTCP consumes more energy in failure
detection and recovery than FTEP, and therefore completes
on average 0.42% fewer rounds than FTEP. With 100 nodes,
the network remains alive for up to 860 rounds.



Figure 6 Network Lifetime (x-axis: Number of Nodes)



Detection Accuracy 

Figure 7 shows the effect of the packet loss rate on
detection accuracy for different cluster sizes. In the
simulation, the packet loss rate ranges from 0.2 to 0.4.
The probability of false alarm increases with the packet
loss rate, which leads to lower detection accuracy. A
larger number of sensor nodes in the cluster leads to a
smaller probability of false alarm, i.e., higher detection
accuracy. As expected, FTTCP achieves high detection
accuracy.
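
Both trends can be illustrated with a back-of-the-envelope calculation. Assume, purely for illustration, that each cluster member loses the hello message independently with probability p, and that a failure is declared only when all n bits of the status vector are set. A false alarm on a live CH then requires all n members to miss the same hello message, which happens with probability p^n. This simplified model ignores losses of the status vector broadcasts themselves and is not the simulation model behind Figure 7; the function name is hypothetical.

```python
def false_alarm_probability(loss_rate: float, cluster_size: int) -> float:
    """Probability that every member of the cluster misses one hello.

    Assumes independent packet loss at each member and a unanimous
    agreement rule (all status vector bits must be set).
    """
    return loss_rate ** cluster_size
```

Under these assumptions, a cluster of 10 members at a loss rate of 0.2 gives a false alarm probability of about 1.0e-7, and the value shrinks further as the cluster grows and rises as the loss rate increases.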

CH election overhead 

With 200 nodes, the node failure frequency is 1% after
every 50 s for both FTEP and FTTCP. It can be observed from
Figure 8 that FTTCP consumes slightly more energy for CH
failure recovery than FTEP (0.64% more on average). This is
because FTTCP, like FTEP, elects the backup node as the new
CH, but it also elects a new backup node for the new CH,
which results in more messages being exchanged. In FTEP, no
backup node is elected for the new CH.




Figure 7 Detection Accuracy (x-axis: Packet Loss Rate)

Figure 8 CH Election overhead

VI. Conclusion 

FTTCP is an agreement-based fault detection and
recovery protocol for faulty CHs in WSNs with two-level
clustering. FTTCP periodically checks for CH failure, and
this detection process runs in parallel with network
operation. It provides high accuracy because it allows each
cluster member to detect its faulty CH independently and
employs a distributed agreement protocol to reach agreement
on the failure of the CH among multiple cluster members. To
recover from a faulty CH, the backup node is elected as the
new CH and a new backup node is elected locally. The
election of the CH and of the backup node is based on the
residual energy of the sensor nodes. Simulation results
show that FTTCP consumes slightly more energy than FTEP but
provides high detection accuracy in harsh environments.